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HIGHLIGHTS 


• We investigate the spatial variability of the AOD-PM2.5 relationship. 

• The model-predicted PM2.5 mass concentrations are highly correlated with the actual observations (R 2 = 0 . 89 ). 

• The model captures the pollution levels along highways. 

• High accuracy of PM2.5 estimates enables to examine PM2.5 levels within cities. 
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To date, spatial-temporal patterns of particulate matter (PM) within urban areas have primarily been 
examined using models. On the other hand, satellites extend spatial coverage but their spatial resolution 
is too coarse. In order to address this issue, here we report on spatial variability in PM levels derived from 
high 1 km resolution AOD product of Multi-Angle Implementation of Atmospheric Correction (MAIAC) 
algorithm developed for MODIS satellite. We apply day-specific calibrations of AOD data to predict PM2.5 
concentrations within the New England area of the United States. To improve the accuracy of our model, 
land use and meteorological variables were incorporated. We used inverse probability weighting (IPW) 
to account for nonrandom missingness of AOD and nested regions within days to capture spatial vari- 
ation. With this approach we can control for the inherent day-to-day variability in the AOD-PM2.5 
relationship, which depends on time-varying parameters such as particle optical properties, vertical and 
diurnal concentration profiles and ground surface reflectance among others. Out-of-sample “ten-fold” 
cross-validation was used to quantify the accuracy of model predictions. Our results show that the 
model-predicted PM2.5 mass concentrations are highly correlated with the actual observations, with out- 
of-sample R 2 of 0 . 89 . Furthermore, our study shows that the model captures the pollution levels along 
highways and many urban locations thereby extending our ability to investigate the spatial patterns of 
urban air quality, such as examining exposures in areas with high traffic. Our results also show high 
accuracy within the cities of Boston and New Haven thereby indicating that MAIAC data can be used to 
examine intra-urban exposure contrasts in PM2.5 levels. 

© 2014 Elsevier Ltd. All rights reserved. 


1. Introduction 


* Corresponding author. Department of Geography and Human Environment, Tel- 
Aviv University, Israel. 

E-mail address: achudnov@post.tau.ac.il (A.A. Chudnovsky). 


Numerous studies have reported associations between mortal- 
ity and morbidity outcomes and particulate matter with aero- 
dynamic diameter <2.5 pm (PM2.5) (Dockery et al., 1993; Schwartz 
1994; Pope etal., 2002; U.S. Environmental Protection Agency (EPA) 
2004). Although ground-level PM2.5 monitoring sites provide 
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accurate measurements, their spatial coverage is limited and thus 
often insufficient to capture the PM 2.5 spatial variability for expo- 
sure and epidemiological studies. 

Satellite imagery adds another important tool to air quality and 
pollution monitoring due to its extensive spatial coverage and 
repeated observations of the earth surface and atmosphere. The 
most common parameter derived from satellite observations is the 
Aerosol Optical Depth (AOD), which is a measure of the extinction 
of electromagnetic radiation at a given wavelength due to the 
presence of aerosols in an atmospheric column. However, the 
satellite-derived AOD is a measure of light attenuation in the col- 
umn which is affected by ambient conditions (e.g., humidity, ver- 
tical profile, chemical composition etc.), while PM 2.5 is a measure of 
dry particle mass near the surface; therefore, these two parameters 
are not expected to be strongly correlated (Chudnovsky et al., 2012). 

To date, spatial patterns of particle exposure within populated 
areas have been examined by models, such as land use regression 
(Jerrett et al., 2005; Hoek et al., 2008), line dispersion models such 
as CAlifornia LINE Source Dispersion Model (CALINE, Benson, 1984, 
1992), or atmospheric chemistry and transport models such as 
GEOS-Chemical transport model (CTM) (http://geos-chem.org/). 
Indeed, it is challenging to deploy hundreds of monitors, needed 
to measure spatial variability in PM levels. On the other hand, 
satellites extend spatial coverage but their spatial resolution is 
insufficient. The Moderate Resolution Imaging Spectroradiometer 
(MODIS) onboard the Terra and Aqua satellites provide a daily 
global coverage but the conventional resolution of its aerosol 
product (10 km Dark Target (DT)) is often too coarse for suitable 
exposure estimates in urban areas. The widely anticipated 3 km 
MODIS AOD product is expected to be generated as a part of 
Collection 6 re-processing (Levy et al., 2013; Remer et al., 2013). 
Recently, a new Multi-Angle Implementation of Atmospheric 
Correction (MAIAC) algorithm was developed for MODIS which 
provides aerosol information at 1 km resolution (Lyapustin et al., 
2011 a, b). Several studies published in the last 3 years have 
shown that high spatial resolution is essential to detect spatial 
variability in PM levels (Kumar et al., 2011 ) and in aerosol loadings 
at regional and at a sub -10 km scale (e.g. intra-urban domain) 
(Emili et al., 2011, Chudnovsky et al., 2013a). In another study 
Chudnovsky et al. (2013b) showed that MAIAC provides a factor of 
1.5— 1.8 more data than the DT 10 km over New England; the 
improvement being over brighter urban surfaces and partly cloudy 
days. However authors conducted straightforward analyses of the 
AOD vs PM 2.5 association as a simple metric to compare between 
10 km and 1 km retrievals without any goal to provide PM 2.5 es- 
timates at the urban scale. 

Recently, several studies proposed that the effects of the time- 
varying parameters influencing the AOD-PM 2.5 relationship can 
be taken into account with daily adjustments, resulting in much 
higher R 2 than previously reported in the literature, ranging from 
0.83 to 0.92 (Lee et al., 2011 ; Chudnovsky et al., 2012). Furthermore, 
Kloog et al., 2011 and Kloog et al., 2012 extended the previously 
established model on day-specific calibrations of AOD data by 
incorporating Land Use (LU) and meteorological variables. The goal 
of the current study is to investigate if the improved 1 km spatial 
resolution would better explain the variations in PM 2 . 5 . We use 
MAIAC 1 km resolution AOD data from MODIS Aqua for the New 
England area to obtain daily PM 2.5 estimates for AOD retrieval days 
and explore its applicability at the intra-urban scale. By further 
developing the approach described in Kloog et al. (2011), we 
investigate the extent of spatial variability of the AOD-PM 2.5 rela- 
tionship on a daily basis and show how this variability can be 
captured by a mixed effects model approach during the period of 
2003. Finally, we present the modeled spatial pattern of PM 2.5 
levels within the study domain for selected days. 


2. Material and methods 

2.1. Ground-level PM 2.5 data 

Twenty-four hour PM 2.5 concentrations were calculated at 62 
U.S. Environmental Protection Agency (EPA) PM 2.5 monitoring sites 
during 2003 (Fig. 1 ). These include 14 sites in Maine (ME), 10 sites in 
New Hampshire (NH), 6 sites in Vermont (VT), 16 sites in Massa- 
chusetts (MA), 16 sites in Connecticut (CT) and 2 sites in Rhode 
Island (RI). Sampling frequency differed by site and included sam- 
ples collected every day, every third day, and every sixth day. 
Additionally, we used 24 h PM 2.5 concentrations from the Harvard 
School of Public Health (HSPH) supersite located near downtown 
Boston, MA. 

2.2. Satellite data 

A new algorithm MAIAC (Lyapustin et al., 2011a, b; 2012a) has 

been developed to process MODIS data. MAIAC retrieves aerosol 
parameters over land at 1 km resolution simultaneously with pa- 
rameters of a surface bidirectional reflectance distribution function 
(BRDF). This is accomplished by using a time series of MODIS 
measurements and simultaneous processing of groups of pixels. 
The MAIAC algorithm ensures that the number of measurements 
exceeds the number of unknowns, a necessary condition for solving 
an inverse problem without empirical assumptions typically used 
by current operational algorithms. The MODIS time series accu- 
mulation also provides multi-angle coverage for every surface grid 
cell, which is required for the BRDF retrievals from MODIS data. The 
aerosol parameters include optical depth, Angstrom exponent from 
0.47 to 0.67 pm, and aerosol type including background, smoke and 
dust models (Lyapustin et al., 2012b). The background models are 
specified regionally based on the climatology of the Aerosol Robotic 
Network (AERONET) (Holben et al., 1998) sun-photometer data for 
relatively low AOD days (<0.5). The smoke/dust types are identified 
as aerosols with increased shortwave absorption (Lyapustin et at, 
2012 b) and dominant fine/coarse mode particles, respectively. 
AERONET validation over the continental USA showed that the 
MAIAC and MODIS Dark Target (DT) algorithms have a similar ac- 
curacy over dark and vegetated surfaces, but also showed that 
MAIAC generally improves accuracy over brighter surfaces, 
including most urban areas (Lyapustin et al., 2011b). The improved 
accuracy of MAIAC results from using the explicit surface charac- 
terization method in contrast to the empirical surface parameter- 
ization approach, which is utilized in the DT algorithm. Further, 
MAIAC incorporates a cloud mask (CM) algorithm based on spatio- 
temporal analysis which augments traditional pixel-level cloud 
detection techniques (Lyapustin et al., 2008). In this work, the re- 
sidual contamination by clouds and cloud shadows was addition- 
ally reduced by discarding 2 pixels adjacent to detected clouds. 
Importantly, the MAIAC approach becomes indispensable in het- 
erogeneous aerosol environments, e.g. with local sources such as 
fire smoke plumes or in urban/industrial areas. 

2.3. Spatial (LU) and temporal (meteorology) predictors of PM 2.5 

Different land use (LU, spatial predictors) and meteorological 
(MET, temporal predictors) variables were examined to improve 
the predictions of PM 2.5 at the 1 km scale. Whereas land use pa- 
rameters enable us to capture traffic and point sources, meteoro- 
logical conditions can influence the AOD-PM relationship and AOD 
retrieval accuracy. For example, AOD generally increases with 
relative humidity for hygroscopic particles due to hygroscopic 
growth (Bergin et al., 2000; Altaratz et al., 2013). 
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Fig. 1 . Map of the study area showing the location of EPA monitoring sites across New England. The area highlighted by rectangle show Boston and New Haven. 


Multiple LU and MET potential predictors of PM measured at 
EPA sites within the grid cell were examined. Different sets of 
predictors were combined and the model for each set was run. The 
selection of a best model was based on the AIC score (Akaike in- 
formation criterion, a measure of the relative goodness of fit that is 
asymptotically related to out of sample prediction R 2 ) to further 
backward-select the variables in each combination. Finally, the 
combination of predictors with the minimum AIC was selected. As a 
result of this work, the model was constructed with the following 
spatial and temporal predictors: percent of the grid cell that is ur- 
ban, elevation. Normalized Difference Vegetative Index (NDVI), 
major road density within a grid, distance to PM2.5 point emissions, 
relative humidity, height of the planetary boundary layer and wind 
speed. 

2.3.1. Percent of urban areas 

We used 30 m resolution 2001 national land cover data (NLCD). 
Data were obtained as raster files with 30 m cell size from mrlc.gov. 
The ArcGIS Spatial Analyst Neighborhood Focal Statistics tool was 
used to calculate count of urban cells in 33 x 33 cell rectangle, 
where the category “developed land” was counted as urban. Then, 
percent of urban spaces data in 33 x 33 30 m cell rectangle sur- 
rounding each EPA monitor were calculated. 

2.3.2. Elevation 

Elevation data above sea level in meters were obtained through 
the national elevation dataset (NED) (Maune, 2007). NED is 
distributed by the U.S. Geological Survey (USGS) and provides 
seamless raster elevation of the conterminous United States. A 30 m 
resolution raster was created from 1/3 arc second data using 
ArcGIS. 


2.3.3. Normalized Difference Vegetative Index 
Satellite-derived Normalized Difference Vegetation Index 

(NDVI) based on red and near infrared (NIR) reflectances a temporal 
indicator of the vegetation cover and its phenological state (Tucker, 
1979; Tucker and Sellers, 1986): 

NDVI = (p NIR - p RED ) /(p NIR + p™ 0 ) (1) 

We used the 1 km MODIS monthly vegetation indices product 
(MOD13A3). Except for periods of spring green-up and fall senes- 
cence associated with seasonal surface change, the NDVI can 
generally be considered relatively constant on a monthly basis. 
Many other studies suggested that NDVI was found to be good 
predictors for local means of pollutant concentrations (e.g. Su et al., 
2009). 

2.3.4. Distance to PM 2.5 point source emissions 

PM 2.5 point source emissions were obtained through the 2005 
US EPA National Emissions Inventory (NE1) facility emissions report 
(EPA, 2010) which includes large sources that emit more >100 tons 
per year. The distance (in km) from the AOD grid centroid to the 
nearest point source emission (from the EPA emissions dataset, 
tons per year) was calculated in ArcGIS. Out of the 62 monitors used 
in the analysis, 6 were within 150 m, 19 were within 550 m and 26 
were within 2—10 km of an industrial source, respectively. 

2.3.5. Road density 

Major roads were selected from ESRI Street Map data provided 
with ArcGIS (version 10) using census Feature Classification Code 
(FCC) (e.g. Al: primary highway with limited access, A2: primary 
road without limited access, A3: secondary and connecting road). 


192 


A.A. Chudnovsky et al. / Atmospheric Environment 89 (2014) 189-198 


Road density was calculated using ArcGIS to create a line density 
raster at a 1 x 1 km resolution to match the AOD grid cells. We used 
all Al— A3 roads in our study region. Thus each AOD gird cell was 
assigned the density (e.g. road density) in each grid based on the 
length of intersecting roads in each grid. Because the distributions 
of major roads were highly right-skewed, we used a logarithmic 
transformation. 

2.3.6. Temporal predictors: meteorological data 

All meteorological variables used in the analysis (wind speed 
(WS), relative humidity (RH)) were obtained through the national 
climatic data center (NCDC) (NCDC, 2010). There were 44 active 
MET stations across New-England during the study period. Daily 
meteorological (MET) data was used in the analysis. Using ArcGIS 
we assigned to each AOD grid the closest MET station in our dataset 
(based on multiple MET data sources including EPA and The Na- 
tional Climatic Data Center (NCDC)). Height of the planetary 
boundary layer (PBL) was obtained from the NOAA Reanalysis Data. 

2.4. Statistical model 

We apply a mixed effects model approach to MAIAC AOD re- 
trievals and other meteorological and LU variables to predict PM 2.5 
concentrations in each grid cell. This model allows for the regression 
intercepts and slopes to vary daily in order to account for the inherent 
day-to-day variability in the AOD-PM 2.5 relationship. Furthermore, 
since New England is a relatively large area and PM-AOD relationship 
can vaiy spatially, we partitioned our study area into three sub- 
regions (Fig. 1) and allowed for the daily AOD-PM 2.5 slopes to vary 
by region. Region 1 included ME, VT and NH states, region 2 included 
MA while CT and RI formed region 3 in our analyses. Although there 
are some variations among the three regions in topography and 
climate conditions mostly via the usual north-south snow cover 
gradient in winter, the main difference appears at the level of ur- 
banization and land use affecting surface brightness and thus the 
AOD vs PM 2.5 relationships (Chudnovsky et al., 2013b) and the quality 
of aerosol product. For instance, validation analysis of MOD1S 3 km 
product (Munchak et al., 2013) showed a strong correlation between 
percent of retrievals above expected error and percent of the urban 
land cover. A similar investigation is ongoing for MAIAC. Of the three, 
region 1 is least urbanized with high fraction of forest cover and re- 
gion 2 is most urbanized. Thus, by dividing the study area into regions 
we can evaluate the role of environmental conditions (e.g. snow 
coverage) and different land use settings on AOD-PM 2.5 relationship. 
We used the following mixed effects model with random intercepts 
and slopes that was applied on AOD retrieval days/pixels (Eq. (2)): 

PMy = ^0! + Uj 4- gjireg) j + [(^1 "h Vj + hjtregf j x 

+ ^Elevation + (1 3 NDVI + 0 4 WS + 0 5 Urban 

+ /36bog(RoadDensity) + f3 7 PBL + /Sg Humidity 

+ /SgDistanceEmission + (l w WS*PBL 

+ /3 11 Log(RoadDensity)*PBL + /3 12 AOD 2 + ey (2) 

( u j v j ) ~ [(0 0)’ 5 >] 

(gj(reg)hj (reg)) ~ [(° °), JZ] 
reg 

where PMy is the PM 2.5 concentration at a spatial site i on day j; 
AODy is the AOD value in the grid cell corresponding to site i on day 
j ; a and Uj are the fixed and random intercepts, respectively; 0] and 


Vj are the fixed and random slopes, respectively; Wind speed (WS), 
Humidity, PBL (Planetary Boundary Layer) are the values in the grid 
cells corresponding to site i on a day j. Elevation, road density 
(RoadDensity), distance to emission sources (DistanceEmission), 
percent of urban space (Urban) are the values in grid cells corre- 
sponding to site i. gj(reg) and hj( reg ) are the daily random intercepts 
and AOD slopes specific to each study area region, ey ~ N(0, a 2 ) is 
the error term at site i on a day j and Sjj is the variance-covariance 
matrix for the random effects. The AOD fixed effect in the model 
(Eq. (2)) accounts for the effect of AOD on PM 2 . 5 , which was the 
same for all study days. The AOD random effects explain the daily 
variability in the PM 2 . 5 -AOD relationship. Since height of the 
boundary layer may vary with wind speed (Oke, 1987), influencing 
the concentration and vertical profile of pollutants, both terms 
were included as interaction terms. For example, boundary layer 
not only controls transport and location of pollutants and aerosols 
but also their concentrations would be different in variable 
boundary layer structures (Angevine et al., 2013). The solution of 
the mixed model equations is maximum likelihood, a form of 
estimation that accounts for the parameters in the fixed-effects 
structure of the model to reduce the bias in the covariance 
parameter estimates (Lindstrom and Bates, 1988; Laird and Ware, 
1982). Currently, this method is implemented in the SAS (Statisti- 
cal Analysis System) statistical software package version 9.3 (proc 
mixed). 

In addition, we incorporated inverse probability weighting 
(IPW) to potentially avoid bias in the regression coefficient esti- 
mates and thus in the resulting predictions. This approach effec- 
tively up-weights dates and grid cells which are under-represented 
due to missing data, as described in Kloog et al., 2012. Finally, PM2.5 
concentrations for each grid cell on a dayj were estimated using the 
corresponding AOD values where the fixed and random intercepts, 
the fixed and random slopes for each study day and for each region 
were derived from Eq. (2). 

Importantly, in this study we compared the full model above 
with reduced models to ascertain the benefits of increasing 
complexity. The models examined were: 1) AOD only model; 2) 
AOD + MET model; 3) AOD + LU model; and 4) AOD + LU + MET. 

2.5. Model validation 

We use a cross-validation (CV) approach to evaluate the ability 
of the model to predict PM2.5 concentrations for each pixel in the 
study area. Thus, the dataset was repeatedly randomly divided into 
90% (calibration) and 10% (held-out test) splits. We applied the 
fitted calibration model to estimate PM2.5 for the held-out test set. 
This “out-of-sample” process was repeated ten times to calculate 
the cross-validated (CV) R 2 values. Subsequently, the predicted 
PM2.5 concentrations were compared to those measured at each 
site. Overall temporal R 2 was calculated by regressing APM against 
A predicted where: APM is the difference between the observed 
PM2.5 at a given site on a given day and the annual mean PM2.5 at 
that location, and A predicted is defined similarly for the predicted 
values generated from the model. By averaging our estimated daily 
exposures at each location we can assess the accuracy of our model 
for long term exposures. This enables us to study both the short 
term and long term effects of ambient particles, respectively. 
Overall spatial R 2 (or cross-sectional comparison) was calculated by 
regressing the annual mean PM2.5 at a given site against the annual 
mean predicted PM2.5 at that location. 

2.6. Estimation of PM 2 .s levels and their spatial variability 

We examined spatial PM2.5 patterns in New England using the 
mixed effects models described above. We focus here on five days 
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based on the MAIAC AOD retrieval from Aqua in 2003 : (i) a medium 
to high pollution event (regional source of pollution): June 25, and 
(ii) low pollution events (local sources): April 25, May 19, July 31 
and November 15. We focus in more detail on the June 25th and 
November 15th data to analyze the consistency and quality of the 
high-resolution AOD retrievals. In particular, it is known that the 
MODIS dark target algorithm (Levy et al., 2007) has a bias over 
brighter urban surfaces (e.g., Munchak et al., 2013), and this com- 
parison has been performed to assure that MAIAC algorithm has 
reduced this error. 

3. Results 

3.1. PM 2.5 prediction based on a mixed-effects model 

Using the entire dataset, estimates of PM 2.5 concentrations were 
obtained for 196 days (e.g. days with available AOD vs PM 2.5 pairs) 
during 2003. The retrieval rate is lower for higher numbers of AOD 
vs PM 2.5 pairs: there were 95 days with at least five pairs and 44 
days with at least 10 pairs. Note that during the same period of 
observations but using the conventional 10 km AOD retrieval (DT) 
for the entire dataset there were 177 days available (Kloog et al., 
2011 ). 

The fixed effects of the AOD intercept and slope were statisti- 
cally significant: a = 8.91 (p < 0.0001 ); and /?i = 16.20 (p < 0.0001 ), 
respectively. The fixed effects of spatial and temporal predictors 
were also significant. In addition, the random slopes for AOD by 
day, and by day and region were both significant (p < 0.0001 ). Fig. 2 
shows the daily variation of random AOD intercepts and slopes. 
Note that these results (significant random effects) support the 
hypothesis that because the parameters influencing the relation- 
ship between PM2.5 and AOD vary from day to day within a given 
domain, it is necessary to adjust for this daily variability. Table 1 
presents modeled and cross-validated R 2 for 2003 for four 
different models: the AOD model, AOD + MET model, AOD + LU 
model and our final model: AOD + LU + MET. As can be seen, 
adding the LU terms to the AOD model significantly improves the 
spatial R 2 , and the full model, incorporating LU and MET, improves 
it further. In contrast, the AOD model is sufficient to well charac- 
terize the temporal variation. The final CV R 2 resulted in an R 2 value 
of 0.88 with a spatial R 2 of 0.80. In addition, if we regress the out of 
sample measured PM2.5 against the predicted we get a measure of 
bias in the relationship. We obtained a slope of 0.99, and intercept 
of 0 . 01 , indicating a very low (or negligible) bias in the prediction 
model. Furthermore, the increase in spatial R 2 (from R 2 — 0.58 
when the main explanatory variable was AOD to R 2 = 0.80 when all 
parameters were incorporated in a model) is of special importance 
since spatial variation is essential for chronic exposures studies. We 
can contrast this to previous results of Kloog et al. (2011), with 
overall cross-validated R 2 of 0.83 in New England, with a spatial R 2 
of 0.78 and a temporal R 2 of 0.84 when using 10 km AOD. Hence use 
of the MAIAC data improves overall, temporal, and spatial R 2 . 

Fig. 3 shows the seasonal mean residuals per EPA site. As can be 
seen, there is a high prediction accuracy for most of sites and for all 
seasons, with the mean yearly residual value of 1.71 pg/m 3 (stan- 
dard deviation of the mean = 1.20 pg/m 3 ). The fall season shows the 
lowest mean residual value of 1.27 pg/m 3 (stdev = 1.12 pg/m 3 ) 
whereas the winter has the highest mean of 2.48 pg/m 3 
(stdev — 2.21 pg/m 3 ). This is due to the higher AOD retrieval error 
during winter (undetected residual snow) and lower AOD retrieval 
rate. Note that one site (EPA code 33-007-4002) located in the 
forest on a state highway was excluded from our analyses (denoted 
as 1, Fig. 3), since it had the highest average residual compared to 
other sites. There were only a limited number of PM 2.5 measure- 
ments which differed significantly from those observed at the 



Slope 



Intercept 

Fig. 2. Frequency distribution of the random intercepts and slopes. 

nearby sites. Located on the east of Mount Washington, we also 
assume that the nearby upwind peaks affect this site when the 
wind is from the prevailing wind direction. 

3.2. Spatial variability in PM 2.5 levels during moderate and low 
pollution events 

We compare the spatial variability in PM 2.5 levels using the AOD 
model (panel A, Fig. 4) and AOD + LU + MET model (panel B, Fig. 4) 
during the moderate pollution day on June 25. Generally, both 
models show similar spatial pattern. Importantly, this pattern of 
pollution transport can only be captured by using satellite AOD 
retrievals and cannot be estimated solely based on LU and MET 
parameters. Furthermore, both models exhibit variability in PM 2.5 
concentrations across the domain. However, this variability is much 
larger when using the AOD + LU + MET model. For example, 
consider the variability in PM 2.5 concentrations inside of areas 
marked 1 and 2. The AOD + LU + MET model shows not only higher 
variations per unit area but also higher predicted concentrations of 
PM 2.5 along the highway (Route 495, area 2). This area corresponds 
to the Merrimack River Valley, so elevation may be a factor as well 
as industrial development along the river. The lower left part of it 
corresponds to the historically industrial city of Lawrence whereas 


Table 1 

Statistical parameters obtained for the calibration and the prediction stages for each 
of the model. 


Model 

Modeled R 2 

Cross- 
validated R 2 

Spatial R 2 

Temporal R 2 

AIC 

AOD 

0.899 

0.848 

0.582*** 

0.881*** 

6885 

AOD + MET 

0.899 

0.849 

0.563*** 

0.885** 

6865 

AOD + LU 

0.917 

0.874 

0.764*** 

0.892*** 

6662 

AOD + LU + MET 

0.922 

0.879 

0.795*** 

0.892*** 

6640 
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Fig. 3. Seasonal mean residuals per EPA site. 


to the west, elevated PM2.5 concentrations correspond to the in- 
dustrial city of Lowell. Importantly, the area marked 1 in the figures 
shows uniformly high concentrations in the AOD model, but a more 
mixed pattern, with pockets of lower concentrations and a few 
places with even higher concentrations in the AOD + LU + MET 
model. 

What governs spatial variability in PM 2.5 levels during low 
pollution day? To answer this question in Fig. 5 we show several 
days generated by AOD + LU + MET model: April 25, May, 19 and 
July, 31. For all, PM 2.5 concentrations differ by date and location. As 
expected, highly populated areas such as Bridgeport, New Haven, 


Hartford, Boston, Springfield and Providence exhibited higher 
PM 2.5 levels, compared to rural areas of Vermont and southwestern 
New Hampshire. Furthermore, grid cells along major highways 
(e.g., Interstate Highways 91 and 95) tend to have higher PM 2.5 
concentrations, perhaps because these cells are more impacted by 
traffic and are also densely populated. On May 19, urban areas 
(Providence, Boston, and New Haven), roads, and the area along the 
coast line (CT) exhibited higher PM 2.5 concentrations. Furthermore, 
there is a higher variability in PM 2.5 concentrations between urban 
and rural locations. Note however, that for all days central Boston 
appears to have variability in PM 2.5 levels. 



Fig. 4. PM2.5 concentrations during a moderate pollution day, June 25, 2003 with the zoom at greater Boston. Panel A: PM 2 .5 concentrations derived using AOD model. Panel B: 
PM2.5 concentrations derived using AOD + LU + MET. 
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Fig. 5. Sequence of maps showing patterns of PM 2 .s concentrations during different low pollution days. Panels A, B and C are spatial PM 2 5 patterns on April 25, May 19 and July 31, 
2003, respectively. 


To rule out algorithm biases with surface brightness, we present 
an additional AOD map of a clear day acquired on November 15, 
2003 (Fig. 6 , panel A). As can be seen, the MAIAC retrieval is free from 
this artifact of processing above urban areas. This day also served as 
an example of the most extreme difference in PM 2.5 spatial pattern 
using AOD ( Fig. 6 , panel B) models and AOD + LU + MET (Fig. 6 , panel 
C) on low pollution day. Not surprisingly, using AOD model, the 
spatial pattern of PM 2.5 concentrations follows AOD retrievals. 
Furthermore, the variability in PM 2.5 concentrations between 
different locations on this day is 1.5 pg/m 3 . Importantly, as can be 
seen from Fig. 6 (B), the AOD model is prone to overestimate PM 2.5 
concentrations in areas adjacent to clouds. This may represent a 
residual noise from cloud-contaminated pixels as well as a real 


physical signal, for example, presence of elevated hydrated aerosols 
in the vicinity of cloud. When an AOD + LU + MET model was run, 
urban areas and roads appears to be more polluted than sur- 
rounding non-urban areas, with the highest difference in PM2.5 
concentration levels of 4.5 pg/m 3 for the entire domain and between 
different urban settings, or three times the variation of the AOD 
model. The results of the AOD + LU + MET model were supported by 
ground EPA measurements: the seven sites that were available 
on that day measured PM2.5 concentrations ranging from 2.3 to 
8.9 mg/m 3 , with a difference of 6.6 pg/m 3 . 

Finally, in Fig. 7 we compared annual mean measured and 
predicted PM2.5 concentrations using AOD and AOD + LU + MET 
models for all sites located along major highways (191, 193 and 195). 



Fig. 6. Panel A is map of AOD retrievals during November 15, 2003. Panels B and C are PM 2 5 concentrations during the same day using different approaches: AOD model (B) vs 
AOD + LU + MET model (C). See text for explanation. 
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Generally, the AOD + LU + MET model generates better accuracy 
(R 2 = 0.82) for those sites than the AOD model (R 2 = 0.61 ). Using the 
AOD model we tend to underestimate the PM 2.5 levels from road 
pollution. 

3.3. Model accuracy at the city scale 

High resolution AOD may provide information about local 
conditions and intra-urban variability, at scales below 10 km. With 
this in mind, we analyzed the accuracy of our model inside the 
greater Boston area and New Haven (each city contains five ground 
monitoring sites) to examine performance within cities. Fig. 8 
shows that using the AOD + LU + MET model there is a good 
agreement between measured and predicted PM 2.5 concentrations 
for both locations. As can be seen, the model and cross-validated R 2 
are high for both cities indicating a good agreement between the 
measured and predicted concentrations. Furthermore, cross- 
sectional comparison between the mean measured and mean 
predicted PM 2.5 concentrations for each site over the study period 
for Boston and New Haven R 2 are 0.80 and 0.87 respectively (data 
not shown). This comparison is especially important for deter- 
mining whether model predictions are suitable assessments for 
epidemiological studies, which require accurate estimation of 
spatial patterns. This improvement can be related to the improved 
MA1AC accuracy over bright/urban areas that has a direct impact on 
the model accuracy. 

4. Discussion 

In this paper we use the new high-resolution (1 km) AOD 
retrieval from MODIS data based on MAIAC algorithm to predict 
PM 2.5 concentrations within the New England area of the United 
States. The main goal was to study if the high resolution AOD can 
improve our ability to distinguish qualitatively and quantitatively 
spatial patterns of PM 2.5 levels. Toward this end we developed 
mixed effects model to explore the advantages of high resolution 
dataset. Importantly, we have shown that PM 2.5 prediction accu- 
racy improves further by adding meteorological and land use pa- 
rameters. We have shown that using the MAIAC data we obtain 
better predictive power than with the DT data, temporally, spatially, 
and overall. While an overall improvement in R 2 of 3% explained 
may seem modest, we were starting from a high baseline (83% 
explained) and this represents 17% of the remaining unexplained 
variance in concentration. Moreover we have shown that this 
model specifically better captures the effects of urban highways, 
and high frequency spatial variability. 

From the epidemiological and exposure assessment point of 
view, it is of high importance to have information about the 
spatial variability of the exposures in the city. In a previous study 



Kumar et al., 2011 employed the same DT algorithm to retrieve 
the 2- and 5-km AOD resolutions, which were used to predict 
PM2.5 concentrations. The authors reported that their model was 
less accurate in urban locations than the suburban ones. In 
contrast, our study showed high accuracy in selected urban lo- 
cations (Boston and New Haven) thereby indicating that our 
model based on MAIAC data can be used to investigate the 
intra-urban exposure contrasts in PM2.5 levels. Furthermore, 
Kloog et al. (2012) employed a similar model for the Mid-Atlantic 
region but using the coarser DT retrieval. While overall model fit 
was good (cross validated R 2 of 0.81) the authors reported oc- 
casional low PM2.5 concentrations around the major Mid-Atlantic 
Highway (the 1-95) presumably because the relatively coarse 
10 x 10 km 2 grid cell for AOD cannot always capture the con- 
centration gradients near line or point sources. On the contrary, 
Lee et al. (2012) showed that the ground based PM2.5 network 
supplies more accurate prediction for much of the USA that 
10 km AOD. Our study shows that the final constructed model 
with high resolution AOD data capture the pollution levels along 
highways and many other urban locations enabling therefore to 
assess spatial variation within cities. Importantly, these high 
concentrations are not artifacts of retrieval, thereby extending 
our ability to investigate the spatial patterns of urban particulate 
pollution, such as examining exposures in areas with high traffic. 
The direct implementation of our results will outcome in more 
accurate accounting for the magnitude of the association be- 
tween PM2.5 and health outcomes. Finally, additional parameter 
that should be considered in the future modeling of PM2.5 is 
traffic counts from National Transportation Atlas. Unfortunately, 
this data was unavailable for 2003. 

Although the 1 km resolution is still far from optimal, it offers a 
clear advantage over the 10 km and even 3 km AOD data in urban 
studies. First, the improved resolution is expected to not only 
reduce the exposure error but also generally result in larger health 
effects estimates. For example, fine-scale variations in PM2.5 have 
been shown to associate with larger health effects than those that 
vary regionally (Jerrett et al., 2005, 2009), suggesting the potential 
importance of refining exposure predictions. However, it should be 
also noted that recently-developed statistical approaches use land 
use information to get within grid spatial variation at finer than 
1 km scales (Beckerman et al., 2013; Vienneau et al., 2013; Sampson 
et al., 2013) which is potentially a complimentary approach to what 
has been done in our study. 

Despite promising results, more data need to be pre-processed 
and analyzed. First, our model was developed for AOD retrieval 
days/pixels and the next study should expand previously devel- 
oped methodology described in Kloog et al., 2011 aimed to assess 
PM2.5 concentrations on non-retrieval days. Furthermore, to further 
investigate the strengths and limitations of high resolution AOD 



Fig. 7. Mixed effects model performance using AOD model and AOD + LU + MET model as assessed by annual mean measured and annual mean predicted PM 2 .5 concentrations for 
sites along major highways (191, 193 and 195). 
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Fig. 8. Mixed effects model R 2 and cross-validated R 2 (10% of data) performance as assessed by measured and predicted daily PM 2.5 concentrations) for Boston (panel A) and New 
Haven (panel B). 


data for modeling PM 2.5 concentrations we are planning a 
comprehensive multi-year study based on the full set of MOD1S 
measurements. Next, further improvements to the MAIAC AOD 
retrieval algorithm would improve accuracy in PM 2.5 estimation. 
For example, lack of vertical information highlights the importance 
of combining the satellite image with vertical profiles, like LIDARs. 
It should be also noted that this approach requires a large amount 
of daily PM 2.5 stations, which are not always available in any given 
region. Therefore, the developed model would not be directly 
transferable for areas without sufficient PM 2.5 monitors, such as 
Africa or Latin America. Another limiting factor is the lack of PM 2.5 
measurements in areas of high spatial contrast due to relatively 
smooth EPA monitors location. 
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